ggplot2
ggplot2::ggplot() flyover
color vs. fill
The :: operator references functions inside packages.
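As a small sketch of the :: operator: the calls below reach into ggplot2 without attaching it first. This assumes the ggplot2 package is installed; using its built-in diamonds data here is an assumption based on the output shown in these notes.

```r
# pkg::name looks up `name` inside `pkg` without calling library(pkg).
# Assumption: ggplot2 is installed; the `diamonds` data ships with it.
first_rows <- head(ggplot2::diamonds)

# The same syntax works for functions.
empty_plot <- ggplot2::ggplot()

first_rows
```

After library(ggplot2), the prefix is optional, but it still helps when two packages export a function with the same name.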
## # A tibble: 6 × 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
## # A tibble: 6 × 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49
## 5 0.2 Premium E SI2 60.2 62 345 3.79 3.75 2.27
## 6 0.32 Premium E I1 60.9 58 345 4.38 4.42 2.68
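Start with a ggplot() statement.
Add a geom_[style]() layer such as geom_point(), geom_bar(), geom_boxplot(), geom_density(), geom_vline(), geom_segment(), geom_histogram().
Map variables to aesthetics with aes( ).
ALL variables need to be inside an aes (except facets, later in slides).
Join layers with the + symbol.
Do NOT use %>% between layers of ggplot2 graphics.
Read + as “and” between ggplot2 portions, instead of “and then” with %>% syntax.
Steps 4 and 5 can be switched.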
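The pieces above can be sketched in a few lines. This is a minimal illustration rather than the slides' own example; it assumes ggplot2 is installed and uses its built-in diamonds data as a stand-in for any data frame.

```r
library(ggplot2)

# Assumption: the built-in `diamonds` data stands in for any data frame.
p <- ggplot(data = diamonds) +              # the ggplot() statement sets up the plot
  geom_point(aes(x = carat, y = price)) +   # a geom, with variables mapped inside aes()
  ggtitle("Diamonds Data")                  # each piece is joined with +, not %>%

p
```

Printing p draws the plot; until then, p is just an object holding the data, the layer, and the title.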
Let’s look at our BabyNames data set again.
## Rows: 1,792,091
## Columns: 4
## $ name <chr> "Mary", "Anna", "Emma", "Elizabeth", "Minnie", "Margaret", "Ida"…
## $ sex <chr> "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F",…
## $ count <int> 7065, 2604, 2003, 1939, 1746, 1578, 1472, 1414, 1320, 1288, 1258…
## $ year <int> 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880…
names <- c("Olivia", "Zoe", "Quentin")
Names <-
  BabyNames %>%
  filter(name %in% names) %>%
  group_by(name, year) %>%
  summarise(total = sum(count, na.rm = TRUE))
Names %>%
  head()
## # A tibble: 6 × 3
## # Groups: name [1]
## name year total
## <chr> <int> <int>
## 1 Olivia 1880 44
## 2 Olivia 1881 51
## 3 Olivia 1882 52
## 4 Olivia 1883 46
## 5 Olivia 1884 54
## 6 Olivia 1885 59
This isn’t easy to read, and it’s in bad form:
ggplot(data = Names, aes(x = year, y = total)) + geom_line() + aes(colour = name) + theme(legend.position = "right") + labs(title = "")
Start instead with a bare call to ggplot().
Nothing is here! That is exactly what is supposed to happen. Calling
ggplot() only tells R that we are ready to plot and that we
want to set aside some space to create the plot.
Still nothing! We need to tell it what our axes are.
Note that ggplot uses +, NOT %>%. This
is because we are adding layers to our plots.
## Error in `check_required_aesthetics()`:
## ! geom_line requires the following missing aesthetics: x and y
Note: this is why I like to map aesthetics first, so we can avoid errors.
Rule of thumb: any time you plot with ggplot, ALL
variables need to be inside an aes (except facets, later in
slides).
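The build-up described above can be replayed one step at a time. The sketch below uses a small made-up data frame, toy, as a hypothetical stand-in for Names so that it runs with ggplot2 alone; the object names are illustrative only.

```r
library(ggplot2)

# Hypothetical stand-in for the Names table (made-up values).
toy <- data.frame(
  year  = rep(2000:2004, times = 2),
  total = c(10, 12, 15, 14, 18, 5, 7, 6, 9, 11),
  name  = rep(c("A", "B"), each = 5)
)

p_blank <- ggplot()                                      # empty canvas, no data
p_data  <- ggplot(data = toy)                            # data attached, still no axes
p_axes  <- ggplot(data = toy, aes(x = year, y = total))  # axes appear, nothing drawn yet
p_lines <- p_axes + geom_line(aes(color = name))         # the first layer draws the lines

# A geom with no x and y mapped fails only once the plot is built or printed.
p_bad  <- ggplot(data = toy) + geom_line()
result <- tryCatch(ggplot_build(p_bad), error = function(e) e)
inherits(result, "error")
```

Mapping x and y in ggplot() first, as in p_axes, means every geom added afterwards already has the aesthetics it requires.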
ggplot(data = Names) +
  geom_line(aes(x = year, y = total, color = name)) +
  ggtitle("Names Over Time") +
  xlab("Year") +
  ylab("Popularity") +
  guides(color = guide_legend(title = "Siblings Names"))
ggplot(data = Names) +
  geom_line(aes(x = year, y = total, color = name, linetype = name)) +
  ggtitle("Names Over Time") +
  xlim(c(1972, 2022)) +
  xlab("Year") +
  ylab("Popularity") +
  guides(color = guide_legend(title = "Siblings Names"),
         linetype = guide_legend(title = "Still Siblings Names"))
## Warning: Removed 252 row(s) containing missing values (geom_path).
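As an alternative sketch, the title, axis labels, and legend titles can all be set in one labs() call. The data frame toy below is a made-up, hypothetical stand-in for Names. Giving color and linetype the same legend title is a choice made here, not something the slides do.

```r
library(ggplot2)

# Hypothetical stand-in for the Names table (made-up values).
toy <- data.frame(
  year  = rep(2000:2004, times = 2),
  total = c(10, 12, 15, 14, 18, 5, 7, 6, 9, 11),
  name  = rep(c("A", "B"), each = 5)
)

p <- ggplot(data = toy) +
  geom_line(aes(x = year, y = total, color = name, linetype = name)) +
  labs(
    title    = "Names Over Time",
    x        = "Year",
    y        = "Popularity",
    color    = "Siblings Names",
    linetype = "Siblings Names"   # a shared title lets ggplot2 merge the two legends
  )

p
```

With different titles, as in the second plot above, ggplot2 draws two separate legends.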
facet_wrap()
The syntax for facets uses a formula syntax we haven’t seen much yet. Also, there are two main ways to plot with facets. Here are a few pointers:
facet_wrap() just makes a box for each level of the
categorical variable
Syntax: facet_wrap( ~ categoricalVariable)
data("NCHS")
# !is.na(smoker) keeps cases that are non-missing for `smoker` (i.e. removes NA's)
Heights <-
  NCHS %>%
  filter(age > 20, !is.na(smoker)) %>%
  group_by(sex, smoker, age) %>%
  summarise(height = mean(height, na.rm = TRUE))
head(Heights)
## # A tibble: 6 × 4
## # Groups: sex, smoker [1]
## sex smoker age height
## <fct> <fct> <dbl> <dbl>
## 1 female no 21 1.60
## 2 female no 22 1.62
## 3 female no 23 1.61
## 4 female no 24 1.62
## 5 female no 25 1.63
## 6 female no 26 1.62
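The facet_wrap() pointers above can be tried on data that ships with ggplot2. The sketch below swaps in the built-in mpg data (an assumption made so the example needs no extra package) and counts the panels that result.

```r
library(ggplot2)

# Assumption: ggplot2's built-in `mpg` data replaces NCHS for this sketch.
p_wrap <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap( ~ drv)   # one box per level of the categorical variable `drv`

# Each row of the built layer data records the panel it was drawn in.
n_panels <- length(unique(layer_data(p_wrap, 1)$PANEL))
n_panels
```

The panel count matches the number of distinct values of drv, which is what "a box for each level" means in practice.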
Heights %>%
  ggplot(aes(x = age, y = height)) +
  geom_line(aes(linetype = smoker)) +
  facet_wrap( ~ sex)
facet_grid()
facet_grid() allows control of row & column facets
facet_grid() syntax:
Rows and columns: facet_grid(rows ~ cols)
Rows only: facet_grid( rows ~ . ) (note the required “.”)
Columns only: facet_grid( ~ cols) (no “.” this time)
Heights %>%
  ggplot(aes(x = age, y = height)) +
  geom_line(aes(linetype = smoker)) +
  facet_grid(sex ~ .)
color and fill
## wage educ race sex hispanic south married exper union age sector
## 1 9.0 10 W M NH NS Married 27 Not 43 const
## 2 5.5 12 W M NH NS Married 20 Not 38 sales
## 3 3.8 12 W F NH NS Single 4 Not 22 sales
## 4 10.5 12 W F NH NS Married 29 Not 47 clerical
## 5 15.0 12 W M NH NS Married 40 Union 58 const
## 6 9.0 16 W F NH NS Married 27 Not 49 clerical
CPS85 %>%
  ggplot() +
  geom_density(aes(x = wage, color = sex), alpha = 0.4) +
  facet_grid( ~ married) +
  xlim(0, 30)
## Warning: Removed 1 rows containing non-finite values (stat_density).
CPS85 %>%
  ggplot() +
  geom_density(aes(x = wage, fill = sex), alpha = 0.4) +
  facet_grid( ~ married) +
  xlim(0, 30)
## Warning: Removed 1 rows containing non-finite values (stat_density).
CPS85 %>%
  ggplot() +
  geom_density(aes(x = wage, fill = sex, color = sex), alpha = 0.4) +
  facet_grid( ~ married) +
  xlim(0, 30)
## Warning: Removed 1 rows containing non-finite values (stat_density).
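The three density plots above differ only in which aesthetic carries sex. A minimal sketch of the same contrast follows, using ggplot2's built-in mpg data instead of CPS85 (an assumption, so the code runs with ggplot2 alone).

```r
library(ggplot2)

# Assumption: built-in `mpg` data replaces CPS85 for this sketch.
p_color <- ggplot(mpg, aes(x = hwy)) +
  geom_density(aes(color = drv), alpha = 0.4)              # colored outlines, empty interiors

p_fill <- ggplot(mpg, aes(x = hwy)) +
  geom_density(aes(fill = drv), alpha = 0.4)               # shaded interiors, default outlines

p_both <- ggplot(mpg, aes(x = hwy)) +
  geom_density(aes(color = drv, fill = drv), alpha = 0.4)  # outline and interior match
```

In short, color controls lines and borders, while fill controls the area inside a shape. For geoms with no interior, such as geom_line(), only color has a visible effect.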
CPS85 %>%
  ggplot(aes(x = married, color = sex)) +
  geom_bar() +
  facet_wrap( ~ union, scales = "free") # Note the scales here
CPS85 %>%
  ggplot(aes(x = married, fill = sex)) +
  geom_bar() +
  facet_wrap( ~ union, scales = "free") # Note the scales here
1. establish the frame
2. plot the glyphs (i.e., select a geom)
3. map the aesthetics
4. add labels and title
5. other features (e.g., alpha, sizing, etc.)
Establish the Frame
## Error in `check_required_aesthetics()`:
## ! geom_point requires the following missing aesthetics: x and y
ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point() +
  ggtitle("Diamonds Data") +
  xlab("Carat") +
  ylab("Price")
Notice that I can have aes inside multiple statements.
Notice that when I use constants (like
alpha = 0.3, size = 0.1) they ARE NOT inside
aes.
In general, variables go inside aes and constants go
outside of it (unless we are using facets; see the previous
materials).
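The variables-inside, constants-outside rule can be sketched with three small variations on the diamonds plot. The third variant, a constant placed inside aes(), is a common slip that is not part of the original slides and is included here only as an illustration.

```r
library(ggplot2)

# Constant outside aes(): every point is drawn in red and no legend is added.
p_constant <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(colour = "red", alpha = 0.3, size = 0.1)

# Variable inside aes(): colour varies with depth and a legend is added.
p_variable <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(aes(colour = depth), alpha = 0.3, size = 0.1)

# Slip: a constant inside aes() is treated as a one-level variable, so the
# points take a default scale colour and a legend labelled "red" appears.
p_slip <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(aes(colour = "red"), alpha = 0.3, size = 0.1)
```

If a plot shows an unexpected legend with a single entry, checking for a constant inside aes() is a reasonable first step.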
ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(aes(colour = depth), alpha = 0.3, size = 0.1) +
  ggtitle("Diamonds Data") +
  xlab("Carat") +
  ylab("Price") +
  facet_grid(cut ~ color)
ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(colour = "red", alpha = 0.3, size = 0.1) +
  ggtitle("Diamonds Data") +
  xlab("Carat") +
  ylab("Price") +
  facet_grid(cut ~ color)
aes
aes can either go inside the ggplot()
function, or inside the geom_[chart]() function itself, or
both. The 3 following options create the same plots, but the code is
slightly different.
# Option 1
ggplot(data = diamonds) +
  geom_point(aes(x = carat, y = price, color = clarity),
             alpha = 0.2,
             size = 1) +
  geom_smooth(method = "glm",
              formula = y ~ poly(x, 2), # y = b_0 + b_1 x + b_2 x^2 + e
              aes(x = carat, y = price),
              color = "red") +
  ylim(c(0, 20000))
# Option 2
ggplot(data = diamonds, aes(x = carat, y = price, color = clarity)) +
  geom_point(alpha = 0.2,
             size = 1) +
  geom_smooth(method = "glm",
              formula = y ~ poly(x, 2), # y = b_0 + b_1 x + b_2 x^2 + e
              aes(x = carat, y = price),
              color = "red") +
  ylim(c(0, 20000))
# Option 3
ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(aes(color = clarity),
             alpha = 0.2,
             size = 1) +
  geom_smooth(method = "glm",
              formula = y ~ poly(x, 2), # y = b_0 + b_1 x + b_2 x^2 + e
              color = "red") +
  ylim(c(0, 20000))
I personally prefer to put “global” aesthetics in
ggplot() and “local” aesthetics in the
geom.
Option 1 repeats x and y in both geoms
In Option 2, color = clarity is not needed for geom_smooth
geom_point and geom_smooth both use x and y, so I put them in ggplot()
geom_point uses color = clarity, so I put that ONLY in the geom_point function
In my opinion, Option 3 is the “cleanest” code. This is partly based on stylistic preference and partly based on some internal mechanics of ggplot (beyond the scope of this course). How you write your code is up to you. Just keep it readable!
But again, all 3 options generate the exact same plot (so does it really matter that much which option we use?)
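One way to check that claim, sketched under a few assumptions: build each option and compare the data ggplot2 computes for each layer. To keep the check fast, the code below uses every 50th row of diamonds and leaves out ylim(); red_smooth() and built() are hypothetical helper names introduced only for this sketch.

```r
library(ggplot2)

# Assumption: a subsample keeps the check fast; ylim() is omitted for brevity.
small <- diamonds[seq(1, nrow(diamonds), by = 50), ]

# Hypothetical helper: the red quadratic smoother shared by all three options.
red_smooth <- function(mapping = NULL) {
  geom_smooth(mapping = mapping, method = "glm",
              formula = y ~ poly(x, 2), color = "red")
}

opt1 <- ggplot(data = small) +
  geom_point(aes(x = carat, y = price, color = clarity), alpha = 0.2, size = 1) +
  red_smooth(aes(x = carat, y = price))

opt2 <- ggplot(data = small, aes(x = carat, y = price, color = clarity)) +
  geom_point(alpha = 0.2, size = 1) +
  red_smooth(aes(x = carat, y = price))

opt3 <- ggplot(data = small, aes(x = carat, y = price)) +
  geom_point(aes(color = clarity), alpha = 0.2, size = 1) +
  red_smooth()

# Hypothetical helper: computed data for layer i, with columns in a fixed order.
built <- function(p, i) {
  d <- layer_data(p, i)
  d[, sort(names(d))]
}

same_points <- isTRUE(all.equal(built(opt1, 1), built(opt2, 1))) &&
  isTRUE(all.equal(built(opt1, 1), built(opt3, 1)))
same_smooth <- isTRUE(all.equal(built(opt1, 2), built(opt2, 2))) &&
  isTRUE(all.equal(built(opt1, 2), built(opt3, 2)))

c(points = same_points, smooth = same_smooth)
```

If both values come back TRUE, the three options differ only in how the code reads, which supports choosing among them on readability alone.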